A.  Background

In [1] a method was developed for solving separable nonconvex optimization
problems. The method can be viewed as an extension of the Dantzig-Wolfe
convex programming algorithm to nonconvex problems. The extension involves
embedding the Dantzig-Wolfe approximating linear programs in a branch and
bound algorithm. At each stage of the branch and bound search a restricted
master approximating linear program is solved over a subset of the original
feasible region. Dual variables from the L.P. solution are used in
(nonconvex) single variable Lagrangian subproblem minimizations which price
out new trial solutions for the master L.P. The results of the Lagrangian
subproblem solutions are

a) a global optimality test

b) (perhaps) new columns for the master L.P.

c) (perhaps) a branch in the branch and bound algorithm to further
partition the feasible region.

Details of the method are given in [1], and an extension to ε-optimality
is discussed in [2].

The purpose of this paper is to show that essentially the same method
can be applied to nonseparable nonconvex optimization problems. The
changes are as follows:

a) the Lagrangian subproblem minimizations no longer decompose into
single variable minimizations.

b) the branching rules for branch and bound are modified. The new
rules are a significant improvement and would probably be advantageous
in the separable case also.

The primary advantage of the original method, which is retained in
the extension of this paper, is that nonconvexity only needs to be directly
considered in the essentially unconstrained Lagrangian subproblems.
The efficiency of various methods for such subproblems has been considered
in [3].

The extension is "formal" in the sense that although all the mathematical
operations are valid and capable of routine computer implementation,
the convergence of the algorithm for arbitrary nonconvex problems has not
yet been demonstrated.

The  report  is  intended  to  be  read  along  with  [1]  although  some  material 
presented  there  has  been  restated  here  in  nonseparated  form  for  clarity. 

B.  Preliminary  Results 

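We consider the general, possibly nonconvex, bounded nonlinear optimization
problem

NLP:
$$
\begin{aligned}
\min\ & f(x) \\
\text{subject to}\ & g_i(x) \le 0, \quad i = 1, \dots, m, \\
& a_j \le x_j \le b_j, \quad j = 1, \dots, n.
\end{aligned} \tag{1}
$$

Let

$$C = \{x \mid a_j \le x_j \le b_j \ \ \forall j\}. \tag{2}$$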

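Consider the linearization of NLP obtained by selecting vectors
$x_k \in C$, $k = 1, 2, \dots, r$, and defining for each $x_k$ a convex
combination weight variable $\lambda_k \ge 0$ with $\sum_k \lambda_k = 1$.
Let $P_\lambda$ be the restricted master approximating linear program
defined as

$P_\lambda$:
$$
\begin{aligned}
\min\ & \sum_{k=1}^{r} \lambda_k f(x_k) \\
\text{subject to}\ & \sum_{k=1}^{r} \lambda_k g_i(x_k) \le 0, \quad i = 1, \dots, m, \\
& \sum_{k=1}^{r} \lambda_k = 1, \\
& \lambda_k \ge 0, \quad k = 1, \dots, r.
\end{aligned} \tag{3}
$$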


If $\lambda^*$ is the primal optimal solution to the linear program, then

$$x^* = \sum_{k} \lambda_k^* x_k \tag{4}$$

defines an approximate solution point for the original NLP.
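As a minimal sketch (not code from [1]), the coefficients of $P_\lambda$ and the point $x^*$ of (4) can be assembled directly from the grid points. The helper names `master_data` and `combine`, the toy functions, and the weights are assumptions made purely for illustration; any LP solver could be applied to the resulting data.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def master_data(
    f: Callable[[Vector], float],
    gs: List[Callable[[Vector], float]],
    points: List[Vector],
) -> Tuple[List[float], List[List[float]]]:
    """Return (costs, rows) for the restricted master program (3).

    costs[k]   = f(x_k)    objective coefficient of lambda_k
    rows[i][k] = g_i(x_k)  coefficient of lambda_k in constraint i
    The convexity row (all ones) and lambda >= 0 are left implicit.
    """
    costs = [f(x) for x in points]
    rows = [[g(x) for x in points] for g in gs]
    return costs, rows


def combine(lam: Sequence[float], points: List[Vector]) -> List[float]:
    """Equation (4): x* = sum_k lambda_k x_k."""
    n = len(points[0])
    return [sum(l * x[j] for l, x in zip(lam, points)) for j in range(n)]


# Hypothetical toy data: a nonconvex objective and one constraint.
f = lambda x: x[0] ** 3 - 3.0 * x[0] + x[1] ** 2
g1 = lambda x: 1.0 - x[0] - x[1]
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]

costs, rows = master_data(f, [g1], points)
# Illustrative weights only; they are not claimed to be LP-optimal.
x_star = combine([0.5, 0.25, 0.25], points)
```

Each new grid point adds one entry to `costs` and one entry to every row, which is why the method is naturally a column generation scheme.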

If NLP is a convex program, then the relationships between NLP
and $P_\lambda$ are well understood. These relationships have been used to
develop a column generation algorithm which uses the dual variables from
$P_\lambda$ in Lagrangian subproblems to iteratively find new vectors $x_k$
and hence new columns for the restricted master $P_\lambda$. This method,
which can be viewed as the nonlinear analogue of the Dantzig-Wolfe
Decomposition Principle, has been proved to converge in [4].

In the nonconvex case the $P_\lambda$ and NLP relationships are not as
simply understood. In particular, $P_\lambda$ sometimes underestimates and
sometimes overestimates the original NLP. Nevertheless the $P_\lambda$
linearization can be used in an algorithm for solving NLP.

Suppose that $\pi \in R^m$, $\sigma \in R$ are the dual variables for
$P_\lambda$, and define the Lagrangian function for NLP

$$L(x, \pi) = f(x) - \sum_{i=1}^{m} \pi_i g_i(x). \tag{5}$$

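The dual problem to $P_\lambda$ is

Dual:
$$
\begin{aligned}
\max\ & \sigma \\
\text{subject to}\ & \sum_{i=1}^{m} \pi_i g_i(x_k) + \sigma \le f(x_k), \quad k = 1, \dots, r, \\
& \pi \le 0, \quad \sigma \ \text{unrestricted},
\end{aligned} \tag{6}
$$

which can easily be rewritten as

$$\max_{\pi \le 0} \ \min_{k = 1, \dots, r} L(x_k, \pi). \tag{7}$$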


If the inner minimization were over all $x \in C$ then (7) would be the
standard Lagrangian dual for NLP.
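The step from (6) to (7) can be spelled out as follows, using only the definition (5) of the Lagrangian:

```latex
\begin{align*}
\sum_{i=1}^{m} \pi_i g_i(x_k) + \sigma \le f(x_k)
  \;&\Longleftrightarrow\;
  \sigma \le f(x_k) - \sum_{i=1}^{m} \pi_i g_i(x_k) = L(x_k, \pi),
  \qquad k = 1, \dots, r, \\
\text{so for fixed } \pi \le 0: \quad
  \max \{\, \sigma : \sigma \le L(x_k, \pi) \ \ \forall k \,\}
  \;&=\; \min_{k = 1, \dots, r} L(x_k, \pi).
\end{align*}
```

Maximizing the right-hand side over $\pi \le 0$ then gives (7).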

Lemma 1 (Lower bound)

If $x^*$ is feasible for NLP and $\pi \le 0$, then

$$\min_{x \in C} L(x, \pi) \le f(x^*). \tag{8}$$

Proof

$$\min_{x \in C} L(x, \pi) \le L(x^*, \pi) = f(x^*) - \sum_{i} \pi_i g_i(x^*) \le f(x^*)$$

since $\pi_i \le 0$ and $g_i(x^*) \le 0$. □
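The lower bound can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the toy functions, the box, the multiplier, and the grid resolution are all hypothetical, and a grid search stands in for a genuine global minimization over $C$.

```python
import itertools


def lagrangian(f, gs, x, pi):
    """L(x, pi) = f(x) - sum_i pi_i * g_i(x), as in (5)."""
    return f(x) - sum(p * g(x) for p, g in zip(pi, gs))


def box_grid(bounds, steps):
    """Uniform grid over the box C = [a_1, b_1] x ... x [a_n, b_n]."""
    axes = [[a + (b - a) * s / steps for s in range(steps + 1)]
            for a, b in bounds]
    return list(itertools.product(*axes))


# Hypothetical toy problem: nonconvex f, one constraint g1(x) <= 0.
f = lambda x: x[0] ** 3 - 3.0 * x[0] + x[1] ** 2
g1 = lambda x: 1.0 - x[0] - x[1]
pts = box_grid([(-2.0, 2.0), (-2.0, 2.0)], steps=40)
pi = [-0.5]  # any pi <= 0 is admissible in Lemma 1

# Minimizing over the grid and drawing x* from the same grid keeps the
# inequality of Lemma 1 valid for the discretized problem.
lower = min(lagrangian(f, [g1], x, pi) for x in pts)
feasible = [x for x in pts if g1(x) <= 0.0]
holds = all(lower <= f(x) for x in feasible)
```

Here `holds` confirms that the single number `lower` bounds the objective at every feasible grid point, which is exactly how the Lagrangian value is used for fathoming later on.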

Suppose we solve $P_\lambda$ obtaining optimal primal variables $\lambda$
and optimal dual variables $\pi, \sigma$ with objective function value
$Z$ ($= \sigma$). Let $\hat{x}$ globally solve the nonconvex Lagrangian
subproblem

$$\min_{x \in C} L(x, \pi), \tag{9}$$

and let $x^* = \sum_k \lambda_k x_k$ as in (4).

Theorem 1 (Optimality test)

If a) $f(x^*) \le Z$

b) $g_i(x^*) \le 0$, $i = 1, \dots, m$

c) $L(\hat{x}, \pi) \ge \sigma$

then $x^*$ is globally optimal for NLP.

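Proof

$$Z = \sigma \le L(\hat{x}, \pi) \le \min \{ f(x) \mid x \ \text{feasible for NLP} \} \le f(x^*) \le Z .$$

The first inequality is c), the second is Lemma 1, the third holds because
$x^*$ is feasible by b), and the last is a). Thus $x^*$ solves NLP. □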

A version of this theorem which allows for tolerances in conditions
a), b), c), and the Lagrangian minimization, and which implies ε-global
optimality, can be easily developed as in [2].
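A toleranced form of the three conditions might be coded as below. This is a simplified sketch, not the ε-optimality result of [2]: the single tolerance `eps`, the function name `optimality_test`, and the one-dimensional example are assumptions chosen for illustration.

```python
def optimality_test(f, gs, x_star, x_hat, pi, sigma, z, eps=1e-8):
    """Evaluate conditions a), b), c) of Theorem 1 with tolerance eps."""
    lag_hat = f(x_hat) - sum(p * g(x_hat) for p, g in zip(pi, gs))
    return {
        "a": f(x_star) <= z + eps,
        "b": all(g(x_star) <= eps for g in gs),
        "c": lag_hat >= sigma - eps,
    }


# Hypothetical one-dimensional example on C = [-1, 2]:
# f(x) = x^2, g(x) = 1 - x <= 0, grid points x_1 = 1 and x_2 = 2.
# The master LP has Z = sigma = 1 with x* = 1; its dual is degenerate,
# so any pi in [-3, 0] is dual optimal.
f = lambda x: x * x
g = lambda x: 1.0 - x

# pi = -2: L(x, pi) = (x - 1)^2 + 1 is minimized at x_hat = 1.
passed = optimality_test(f, [g], x_star=1.0, x_hat=1.0,
                         pi=[-2.0], sigma=1.0, z=1.0)

# pi = 0: L(x, pi) = x^2 is minimized at x_hat = 0, so c) fails and
# x_hat = 0 would be priced into the master LP as a new grid point.
failed = optimality_test(f, [g], x_star=1.0, x_hat=0.0,
                         pi=[0.0], sigma=1.0, z=1.0)
```

The example also shows that the outcome of condition c) can depend on which optimal dual solution the LP code happens to return.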


The primary value of Theorem 1 is that if optimality is not achieved,
then it suggests further actions for the optimization algorithm. In
particular, if c) is violated, then the vector $\hat{x}$ generates a column
$[f(\hat{x}), g(\hat{x}), 1]$ with negative reduced cost in the simplex
tableau for $P_\lambda$. Thus $\hat{x}$ should be incorporated as a new
$x_k$ point. If a) or b) is violated, then the NLP is not convex at the
current convex combination $x^*$. In this case the algorithm must resort to
branch and bound to enforce a different convex combination which
(hopefully) does not violate convexity so badly. In [1] the choice of
variable on which to branch was simple and depended on the separated
components $f_j$, $g_{ij}$ of the separable functions $f$ and $g_i$, where
for example,

$$f(x) = \sum_{j=1}^{n} f_j(x_j). \tag{10}$$

The major point of this paper is that reasonable selection rules can be
developed even in nonseparable cases. Before developing these rules we
state the entire algorithm for the nonseparable case in detail.

C.  The  Algorithm 
Step 1 Initialization

Choose an initial set of vectors $x_k \in C$. Let $P_\lambda^t$ with $t = 1$
($t$ = subproblem counter) be the $P_\lambda$ program corresponding to this
initial set. Let $C_j = [a_j, b_j]$. Let $L^t = -\infty$ be the current
largest lower bound for $P_\lambda^t$. Let $F^0 = +\infty$ be the value of
$f(x)$ for the best incumbent feasible solution to NLP found so far. Place
$P_\lambda^t$ on a list of subproblems and go to step 2.

Step 2 Linear Program

If the list of subproblems is empty, stop; the incumbent solution is
globally optimal. Otherwise select a problem $P_\lambda^t$ from the list (see
discussion in section D) and solve it, yielding optimal value $Z$ with
optimal primal variables $\lambda$ and optimal dual variables $\pi, \sigma$.
(If $P_\lambda^t$ is infeasible then the method is slightly modified. See [1]
for details.) Go to step 3.

Step 3 Lagrangian Minimization

Solve the nonconvex Lagrangian problem $\min_{x \in C} L(x, \pi)$ giving
solution $\hat{x}$. Let $B = L(\hat{x}, \pi)$. If $B \ge F^0$ then fathom
$P_\lambda^t$ and go to step 2. If $F^0 > B > L^t$ then increase the value of
the bound for $P_\lambda^t$ to $L^t = B$ and go to step 4. Otherwise go to
step 4 without changing the bound.


Step 4 New Grid Points

If $L(\hat{x}, \pi) < \sigma$ then use $\hat{x}$ to generate a new column
for $P_\lambda^t$. Place the new $P_\lambda^t$ on the list and go to step 2.
If $L(\hat{x}, \pi) \ge \sigma$ then go to step 5.
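Steps 3 and 4 amount to a short decision rule once $\hat{x}$ is available. The sketch below assumes the Lagrangian minimizer is supplied by some global method (not shown); the function name `steps_3_and_4`, the returned action labels, and the tolerance `tol` are hypothetical. It relies on the fact that the reduced cost of the column $[f(\hat{x}), g(\hat{x}), 1]$ is $L(\hat{x}, \pi) - \sigma$.

```python
import math


def steps_3_and_4(f, gs, x_hat, pi, sigma, lower, incumbent, tol=1e-9):
    """Return (action, new_lower_bound, column) for the current subproblem.

    x_hat is assumed to be a global minimizer of L(x, pi) over C.
    """
    g_hat = [g(x_hat) for g in gs]
    b = f(x_hat) - sum(p * gi for p, gi in zip(pi, g_hat))  # B = L(x_hat, pi)
    if b >= incumbent:                 # Step 3: bound cannot beat incumbent
        return "fathom", lower, None
    lower = max(lower, b)              # Step 3: raise the bound if B > L
    if b < sigma - tol:                # Step 4: negative reduced cost
        return "add_column", lower, [f(x_hat)] + g_hat + [1.0]
    return "optimality_test", lower, None   # proceed to Step 5


# Hypothetical example: f(x) = x^2, g(x) = 1 - x, sigma = 1.
f = lambda x: x * x
g = lambda x: 1.0 - x
r1 = steps_3_and_4(f, [g], 0.0, [0.0], 1.0, -math.inf, math.inf)
r2 = steps_3_and_4(f, [g], 1.0, [-2.0], 1.0, -math.inf, math.inf)
r3 = steps_3_and_4(f, [g], 1.0, [-2.0], 1.0, -math.inf, 0.5)
```

In the first call the new column is `[0.0, 1.0, 1.0]`, i.e. the objective entry, the constraint entry, and the convexity-row entry for the grid point $\hat{x} = 0$.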

Step 5 Optimality Test

Compute $x^*$ from $\lambda$ using (4). If $g_i(x^*) \le 0$, $i = 1, \dots, m$,
and if $f(x^*) < F^0$ then replace $F^0$ with $f(x^*)$ and let $x^*$ be the
new incumbent solution.

If a) $f(x^*) \le Z$ and b) $g_i(x^*) \le 0$ $\forall i$, then $x^*$ is
globally optimal for the NLP subproblem over $x \in C$. Go to step 2.

If a) or b) is violated, go to step 6.

Step 6 Branch

Use $x^*$ to generate a new column (nonbasic) for $P_\lambda^t$. Select a
coordinate $x_j^*$, $j = 1, \dots, n$, of $x^*$ (see discussion in section D)
and branch, creating two new subproblems

a) $P_\lambda^t$ restricted to $x_j \le x_j^*$ (include only those columns
for which $(x_k)_j \le x_j^*$) and

b) $P_\lambda^t$ restricted to $x_j \ge x_j^*$ (include only those columns
for which $(x_k)_j \ge x_j^*$).

Compute a bound $L^t$ for each of these subproblems and place both on
the list. Go to step 2.
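The column bookkeeping of the branch can be sketched as follows. The function name `branch` and the sample points are hypothetical, and the sketch assumes that subproblem a) keeps the columns with $(x_k)_j \le x_j^*$ while subproblem b) keeps those with $(x_k)_j \ge x_j^*$, so that the new column $x^*$ belongs to both children.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def branch(points: List[Point], x_star: Point,
           j: int) -> Tuple[List[Point], List[Point]]:
    """Split the grid points on coordinate j at x_star[j].

    x_star is first added as a column; it satisfies both restrictions
    and therefore appears in both children.
    """
    cols = points if x_star in points else points + [x_star]
    child_a = [x for x in cols if x[j] <= x_star[j]]  # x_j <= x_j*
    child_b = [x for x in cols if x[j] >= x_star[j]]  # x_j >= x_j*
    return child_a, child_b


# Hypothetical grid points and convex combination.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
x_star = (0.5, 0.5)
child_a, child_b = branch(points, x_star, j=0)
```

Because $x^*$ is present in both children, each child LP remains feasible whenever $x^*$ itself satisfies the linearized constraints, which is one motivation for adding it before branching.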


D.  Branch  and  Problem  Selection  Rules 

Two aspects of the method remain to be described: the rule for
selecting the next subproblem $P_\lambda^t$ to examine in step 2 and the rule
for selecting the component $x_j^*$ of $x^*$ on which to branch in step 6.
In this section we show that penalty calculations can be used for these
decisions. The penalties used were originally considered in the context
of integer programming and have been applied to separable nonconvex
optimization using special ordered sets in [5]. In fact, neither
separability nor the ordered set property is necessary, as we shall show.

Consider the following linear program

$$
\begin{aligned}
\min\ & cx \\
\text{s.t.}\ & Ax = b \\
& x \ge 0
\end{aligned} \tag{11}
$$

with optimal basis $B$ and optimal solution

$$x_B = B^{-1} b, \quad x_N = 0. \tag{12}$$

Suppose we have available the optimal simplex tableau containing
the transformed constraint matrix

$$T = B^{-1} A \tag{13}$$

and also the row of reduced cost coefficients

$$\bar{c} = c - c_B B^{-1} A. \tag{14}$$

A standard result of postoptimality analysis is that if we force a basic
variable $x_{B_i}$ to zero, and let the remaining variables adjust optimally,
then a first order approximation to the resulting objective function
change is

$$Q_i = x_{B_i} \left( \min_{\substack{j \ \text{nonbasic} \\ t_{ij} > 0}} \frac{\bar{c}_j}{t_{ij}} \right). \tag{15}$$

This approximation is called the "penalty" for forcing $x_{B_i}$ to zero.
It gives the exact change if no basis changes occur before $x_{B_i}$ reaches
zero, and is otherwise an under-estimate.
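For a single basic variable the penalty (15) is a ratio test on one tableau row. The sketch below uses hypothetical names and data; returning infinity when no nonbasic entry of the row is positive is an added convention (the variable then cannot be driven to zero), not something stated in the text.

```python
import math


def penalty(x_bi, t_row, cbar, nonbasic):
    """Penalty (15) for forcing the basic variable in this row to zero.

    x_bi     : current value of the basic variable
    t_row    : its row of the tableau T = B^{-1} A, indexed by column
    cbar     : reduced costs, indexed by column
    nonbasic : indices of the nonbasic columns
    """
    ratios = [cbar[j] / t_row[j] for j in nonbasic if t_row[j] > 0.0]
    if not ratios:
        return math.inf  # assumed convention: no eligible pivot column
    return x_bi * min(ratios)


# Hypothetical row: columns 3, 4, 5 are nonbasic.
t_row = [0.0, 1.0, 0.0, 0.5, -1.0, 2.0]
cbar = [0.0, 0.0, 0.0, 1.0, 0.2, 3.0]
q = penalty(2.0, t_row, cbar, nonbasic=[3, 4, 5])
```

In this example the eligible ratios are 1.0/0.5 and 3.0/2.0, so the penalty is 2.0 times 1.5.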

In the context of branch selection for $P_\lambda^t$ in step 6 of the
algorithm we wish to compute a penalty for each component $x_j$ for the two
resulting subproblems. Each subproblem involves dropping several of the
current vectors $x_k$, or equivalently forcing the corresponding variables
$\lambda_k$ to zero.

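Suppose $S$ is a set of variables which we want to force to zero
if basic or maintain at zero if nonbasic. Then the penalty for this
action is

$$Q_S = \max_{x_{B_i} \in S} \left\{ x_{B_i} \left[ \min_{\substack{j \ \text{nonbasic} \\ x_j \notin S,\ t_{ij} > 0}} \frac{\bar{c}_j}{t_{ij}} \right] \right\}. \tag{16}$$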


In the context of $P_\lambda^t$, then, the appropriate penalties are
obtained from (16) by setting $S$ to be

$$S = \{\lambda_k \mid (x_k)_j < x_j^*\} \tag{17}$$

for subproblem b) of step 6 and

$$S = \{\lambda_k \mid (x_k)_j > x_j^*\} \tag{18}$$

for subproblem a) for each $j = 1, \dots, n$.
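A possible implementation of the set penalty (16) is sketched below. The data layout (dictionaries keyed by basic column index), the name `set_penalty`, the tableau values, and the conventions for empty cases are all assumptions for illustration: the penalty is taken as zero when no basic variable lies in $S$, and as infinite when a positive basic variable in $S$ has no eligible pivot column.

```python
import math


def set_penalty(S, basic_value, rows, cbar, nonbasic):
    """Penalty Q_S of (16) for holding every variable in S at zero.

    S           : set of column indices to be forced to (or kept at) zero
    basic_value : dict, basic column index -> current value x_{B_i}
    rows        : dict, basic column index -> its row of T = B^{-1} A
    cbar        : reduced costs, indexed by column
    nonbasic    : list of nonbasic column indices
    """
    q_s = 0.0  # assumed convention: nothing to force, no penalty
    for col, value in basic_value.items():
        if col not in S or value == 0.0:
            continue
        # Nonbasic members of S must stay at zero, so they may not enter.
        ratios = [cbar[j] / rows[col][j]
                  for j in nonbasic
                  if j not in S and rows[col][j] > 0.0]
        q_i = value * min(ratios) if ratios else math.inf
        q_s = max(q_s, q_i)
    return q_s


# Hypothetical tableau with five columns lambda_0..lambda_4;
# lambda_0 and lambda_1 are basic with values 0.6 and 0.4.
basic_value = {0: 0.6, 1: 0.4}
rows = {0: [1.0, 0.0, 0.5, -0.2, 1.0],
        1: [0.0, 1.0, 0.5, 1.2, 0.0]}
cbar = [0.0, 0.0, 1.0, 0.6, 2.0]
nonbasic = [2, 3, 4]

q_drop_1 = set_penalty({1}, basic_value, rows, cbar, nonbasic)
q_drop_1_3 = set_penalty({1, 3}, basic_value, rows, cbar, nonbasic)
```

Excluding column 3 from the ratio test raises the penalty from about 0.2 to about 0.8, which illustrates why the nonbasic members of $S$ matter in (16). For branching, $S$ would be built from the grid points that fall strictly on the wrong side of $x_j^*$ in the child being evaluated.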

Intuitively a subproblem with a large penalty is unlikely to contain
the global optimal solution to NLP. Thus branch selection in step 6 can
be performed so that one of the two resulting subproblems is most likely
to contain the solution by selecting $j$ to satisfy

$$\min_j \; Q_j^{\pm} \tag{19}$$

or possibly

$$\min_j \; |Q_j^+ - Q_j^-| , \tag{20}$$

where $Q_j^+$ and $Q_j^-$ denote the penalties (16) for the two subproblems
obtained by branching on $x_j$.


In either case the choice of the next $P_\lambda^t$ to work on in step 2 can
then be the highly likely subproblem, placing the unlikely candidate on the
list and hoping that it will be fathomed before it has to be solved.
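The two selection rules can be sketched as below. The function name `select_branch` and the penalty values are hypothetical, and reading (19) as the minimum over both penalties of each coordinate is an interpretation rather than something the text states explicitly.

```python
def select_branch(q_plus, q_minus, rule="min_penalty"):
    """Return (j, preferred) for the branch coordinate.

    q_plus[j], q_minus[j] : penalties of the two children for coordinate j
    preferred             : '+' or '-', the child with the smaller penalty,
                            i.e. the one to examine next in step 2
    """
    if rule == "min_penalty":      # criterion (19), as interpreted here
        key = lambda j: min(q_plus[j], q_minus[j])
    elif rule == "min_gap":        # criterion (20)
        key = lambda j: abs(q_plus[j] - q_minus[j])
    else:
        raise ValueError(f"unknown rule: {rule}")
    j = min(range(len(q_plus)), key=key)
    preferred = "+" if q_plus[j] <= q_minus[j] else "-"
    return j, preferred


# Hypothetical penalties for three coordinates.
q_plus = [3.0, 0.5, 2.0]
q_minus = [1.0, 4.0, 2.1]
choice_19 = select_branch(q_plus, q_minus, "min_penalty")
choice_20 = select_branch(q_plus, q_minus, "min_gap")
```

The two rules can disagree, as they do here, so the choice between them is an empirical matter.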

It should be emphasized that in the nonconvex case the penalties
yield only a guide and not guaranteed bounds on the new subproblems.
Thus they should not be used to infer new bounds $L^t$ on the resulting
subproblems after a branch.

As in all branch and bound procedures the actual choices of subproblem
and branch selection rules should be governed by computational
experience.